Semi-Supervised Multimodal Deep Learning for RGB-D Object Recognition
نویسندگان
چکیده
This paper studies the problem of RGB-D object recognition. Inspired by the great success of deep convolutional neural networks (DCNN) in AI, researchers have tried to apply it to improve the performance of RGB-D object recognition. However, DCNN always requires a large-scale annotated dataset to supervise its training. Manually labeling such a large RGB-D dataset is expensive and time consuming, which prevents DCNN from quickly promoting this research area. To address this problem, we propose a semi-supervised multimodal deep learning framework to train DCNN effectively based on very limited labeled data and massive unlabeled data. The core of our framework is a novel diversity preserving co-training algorithm, which can successfully guide DCNN to learn from the unlabeled RGB-D data by making full use of the complementary cues of the RGB and depth data in object representation. Experiments on the benchmark RGB-D dataset demonstrate that, with only 5% labeled training data, our approach achieves competitive performance for object recognition compared with those state-of-the-art results reported by fully-supervised methods.
منابع مشابه
Learning Deep Representations, Embeddings and Codes from the Pixel Level of Natural and Medical Images
Significant research has gone into engineering representations that can identify high-level semantic structure in images, such as objects, people, events and scenes. Recently there has been a shift towards learning representations of images either on top of dense features or directly from the pixel level. These features are often learned in hierarchies using large amounts of unlabeled data with...
متن کاملLow-Shot Learning for the Semantic Segmentation of Remote Sensing Imagery
Recent advances in computer vision using deep learning with RGB imagery (e.g., object recognition and detection) have been made possible thanks to the development of large annotated RGB image datasets. In contrast, multispectral image (MSI) and hyperspectral image (HSI) datasets contain far fewer labeled images, in part due to the wide variety of sensors used. These annotations are especially l...
متن کاملSemi-supervised Multimodal Learning with Deep Generative Models
In recent years, deep neural networks are used mainly as discriminators of multimodal learning. We should have large amounts of labeled data for training them, but obtaining such data is difficult because it requires much labor to label inputs. Therefore, semi-supervised learning, which improves the discriminator performance using unlabeled data, is important. Among semi-supervised learning, me...
متن کاملImproved RGB-D-T based face recognition
Reliable facial recognition systems are of crucial importance in various applications from entertainment to security. Thanks to the deep-learning concepts introduced in the field, a significant improvement in the performance of the unimodal facial recognition systems has been observed in the recent years. At the same time a multimodal facial recognition is a promising approach. This paper combi...
متن کاملCorrelated and Individual Multi-Modal Deep Learning for RGB-D Object Recognition
In this paper, we propose a correlated and individual multi-modal deep learning (CIMDL) method for RGB-D object recognition. Unlike most conventional RGB-D object recognition methods which extract features from the RGB and depth channels individually, our CIMDL jointly learns feature representations from raw RGB-D data with a pair of deep neural networks, so that the sharable and modalspecific ...
متن کامل